Useful Blunders: Can Automated Speech Recognition Errors Improve Downstream Dementia Classification?
\textbf{Objectives}: We aimed to investigate how errors from automatic speech
recognition (ASR) systems affect dementia classification accuracy, specifically
in the ``Cookie Theft'' picture description task. We aimed to assess whether
imperfect ASR-generated transcripts could provide valuable information for
distinguishing between language samples from cognitively healthy individuals
and those with Alzheimer's disease (AD).
\textbf{Methods}: We conducted experiments using various ASR models, refining
their transcripts with post-editing techniques. Both these imperfect ASR
transcripts and manually transcribed ones were used as inputs for the
downstream dementia classification. We conducted comprehensive error analysis
to compare model performance and assess ASR-generated transcript effectiveness
in dementia classification.
\textbf{Results}: Imperfect ASR-generated transcripts surprisingly
outperformed manual transcription for distinguishing between individuals with
AD and those without in the ``Cookie Theft'' task. These ASR-based models
surpassed the previous state-of-the-art approach, indicating that ASR errors
may contain valuable cues related to dementia. The synergy between ASR and
classification models improved overall accuracy in dementia classification.
\textbf{Conclusion}: Imperfect ASR transcripts effectively capture linguistic
anomalies linked to dementia, improving accuracy in classification tasks. This
synergy between ASR and classification models underscores ASR's potential as a
valuable tool in assessing cognitive impairment and related clinical
applications. Comment: To appear in the Journal of Biomedical Informatics
Enhancing clinical concept extraction with distributional semantics
Abstract. Extracting concepts (such as drugs, symptoms, and diagnoses) from clinical narratives constitutes a basic enabling technology to unlock the knowledge within and support more advanced reasoning applications such as diagnosis explanation, disease progression modeling, and intelligent analysis of the effectiveness of treatment. The recent release of annotated training sets of de-identified clinical narratives has contributed to the development and refinement of concept extraction methods. However, as the annotation process is labor-intensive, training data are necessarily limited in the concepts and concept patterns covered, which impacts the performance of supervised machine learning applications trained with these data. This paper proposes an approach to minimize this limitation by combining supervised machine learning with empirical learning of semantic relatedness from the distribution of the relevant words in additional unannotated text. The approach uses a sequential discriminative classifier (Conditional Random Fields) to extract the mentions of medical problems, treatments and tests from clinical narratives. It takes advantage of all Medline abstracts indexed as being of the publication type “clinical trials” to estimate the relatedness between words in the i2b2/VA training and testing corpora. In addition to the traditional features such as dictionary matching, pattern matching and part-of-speech tags, we also used as a feature words that appear in similar contexts to the word in question (that is, words that have a similar vector representation measured with the commonly used cosine metric, where vector representations are derived using methods of distributional semantics). To the best of our knowledge, this is the first effort exploring the use of distributional semantics, the semantics derived empirically from unannotated text often using vector space models, for a sequence classification task such as concept extraction.
Therefore, we first experimented with different sliding window models and found the model with parameters that led to the best performance in a preliminary sequence labeling task. The evaluation of this approach, performed against the i2b2/VA concept extraction corpus, showed that incorporating features based on the distribution of words across a large unannotated corpus significantly aids concept extraction. Compared to a supervised-only approach as a baseline, the micro-averaged F-score for exact match increased from 80.3% to 82.3% and the micro-averaged F-score based on inexact match increased from 89.7% to 91.3%. These improvements are highly significant according to the bootstrap resampling method and also considering the performance of other systems. Thus, distributional semantic features significantly improve the performance of concept extraction from clinical narratives by taking advantage of word distribution information obtained from unannotated data.
TRESTLE: Toolkit for Reproducible Execution of Speech, Text and Language Experiments
The evidence is growing that machine and deep learning methods can learn the
subtle differences between the language produced by people with various forms
of cognitive impairment such as dementia and cognitively healthy individuals.
Valuable public data repositories such as TalkBank have made it possible for
researchers in the computational community to join forces and learn from each
other to make significant advances in this area. However, due to variability in
approaches and data selection strategies used by various researchers, results
obtained by different groups have been difficult to compare directly. In this
paper, we present TRESTLE (\textbf{T}oolkit for \textbf{R}eproducible
\textbf{E}xecution of \textbf{S}peech \textbf{T}ext and \textbf{L}anguage
\textbf{E}xperiments), an open source platform that focuses on two datasets
from the TalkBank repository with dementia detection as an illustrative domain.
Successfully deployed in the hackallenge (Hackathon/Challenge) of the
International Workshop on Health Intelligence at AAAI 2022, TRESTLE provides a
precise digital blueprint of the data pre-processing and selection strategies
that can be reused via TRESTLE by other researchers seeking comparable results
with their peers and current state-of-the-art (SOTA) approaches. Comment: Accepted at AMIA Informatics Summit
CELLS: A Parallel Corpus for Biomedical Lay Language Generation
Recent lay language generation systems have used Transformer models trained
on a parallel corpus to increase health information accessibility. However, the
applicability of these models is constrained by the limited size and topical
breadth of available corpora. We introduce CELLS, the largest (63k pairs) and
broadest-ranging (12 journals) parallel corpus for lay language generation. The
abstract and the corresponding lay language summary are written by domain
experts, assuring the quality of our dataset. Furthermore, qualitative
evaluation of expert-authored plain language summaries has revealed background
explanation as a key strategy to increase accessibility. Such explanation is
challenging for neural models to generate because it goes beyond simplification
by adding content absent from the source. We derive two specialized paired
corpora from CELLS to address key challenges in lay language generation:
generating background explanations and simplifying the original abstract. We
adopt retrieval-augmented models as an intuitive fit for the task of background
explanation generation, and show improvements in summary quality and simplicity
while maintaining factual correctness. Taken together, this work presents the
first comprehensive study of background explanation for lay language
generation, paving the path for disseminating scientific knowledge to a broader
audience. CELLS is publicly available at:
https://github.com/LinguisticAnomalies/pls_retrieval
Embedding Probabilities in Predication Space with Hermitian Holographic Reduced Representations
Abstract. Predication-based Semantic Indexing (PSI) is an approach to generating high-dimensional vector representations of concept-relation-concept triplets. In this paper, we develop a variant of PSI that accommodates estimation of the probability of encountering a particular predication (such as fluoxetine TREATS major depressive disorder) in a collection of predications concerning a concept of interest (such as major depressive disorder). PSI leverages reversible vector transformations provided by representational approaches known as Vector Symbolic Architectures (VSA). To embed probabilities we develop a novel VSA variant, Hermitian Holographic Reduced Representations, with improvements in predictive modeling experiments. The probabilistic interpretation this facilitates reveals previously unrecognized connections between PSI and quantum theory, perhaps most notably that PSI's estimation of relatedness across multiple reasoning pathways corresponds to the estimation of the probability of traversing indistinguishable pathways in accordance with the rules of quantum probability.
EpiphaNet: An Interactive Tool to Support Biomedical Discoveries
Background. EpiphaNet (http://epiphanet.uth.tmc.edu) is an interactive knowledge discovery system, which enables researchers to explore visually sets of relations extracted from MEDLINE using a combination of language processing techniques. In this paper, we discuss the theoretical and methodological foundations of the system, and evaluate the utility of the models that underlie it for literature‐based discovery. In addition, we present a summary of results drawn from a qualitative analysis of over six hours of interaction with the system by basic medical scientists.
Results: The system is able to simulate open and closed discovery, and is shown to generate associations that are both surprising and interesting within the area of expertise of the researchers concerned.
Conclusions: EpiphaNet provides an interactive visual representation of associations between concepts, which is derived from distributional statistics drawn from across the spectrum of biomedical citations in MEDLINE. This tool is available online, providing biomedical scientists with the opportunity to identify and explore associations of interest to them.
Students’ perceptions of school acoustics and the impact of noise on teaching and learning in secondary schools : findings of a questionnaire survey
This paper will present the design and findings of an online questionnaire survey of 11–16 year olds’ impressions of their school's acoustic environment, and of an experimental study into the effects of typical levels of classroom noise on adolescents' performance on numeracy and cognitive functioning tasks. Analysis of the responses to the questionnaire found that pupils who reported additional learning needs such as hearing impairment, speaking English as an additional language or receiving learning support reported being significantly more affected by poor school acoustics than pupils reporting no additional learning needs. Pupils attending suburban schools featuring cellular classrooms that were not exposed to nearby noise sources were more positive about their school acoustics than pupils at schools with open plan classroom designs or attending schools that were exposed to external noise sources. The study demonstrates that adolescents are reliable judges of their school's acoustic environment, and have insight into the disruption to teaching and learning caused by poor listening conditions. Furthermore, pupils with additional learning needs are more at risk from the negative effects of poor school acoustics.
Maternal obesity reduces placental autophagy marker expression in uncomplicated pregnancies
AIM: Obesity has been associated with changes in autophagy and its increasing prevalence among pregnant women is implicated in higher rates of placental-mediated complications of pregnancy such as pre-eclampsia and intrauterine growth restriction. Autophagy is involved in normal placentation, thus changes in autophagy may lead to impaired placental function and development. The aim of this study was to investigate the connection between obesity and autophagy in the placenta in otherwise uncomplicated pregnancies.
METHODS: Immunohistochemistry and western blot analysis were done on placental and omental samples from obese (body mass index [BMI] ≥30 kg/m
RESULTS: As pre-pregnancy BMI increased, there was an increase in both placental and fetal weight as well as decreased levels of LC3B in the central region of the placenta (P = 0.0046). Within the obese patient group, LC3B levels were significantly decreased in the placentas of male fetuses compared to females (P < 0.0001). Adipocytes, compared to milky spots and vasculature, had lower levels of p62 (P = 0.0127) and LC3B (P = 0.003) in obese omenta and lower levels of LC3B in control omenta (P = 0.0071).
CONCLUSION: Obesity leads to reduced placental autophagy in uncomplicated pregnancies; thus, changes in autophagy may be involved in the underlying mechanisms of obesity-related placental diseases of pregnancy.